Instance Segmentation with Mask R-CNN

Overview and usage example

07_instance_segmentation

Instance Segmentation


Several state-of-the-art models can perform instance segmentation; here we focus on a particularly effective and widely used one: Mask R-CNN, designed by Facebook AI in 2017.

Tutorial by Francesco Pelosin @ Ca' Foscari University

Mask R-CNN in a nutshell

The instance segmentation task requires detecting the objects in the scene and segmenting each of them individually; Mask R-CNN tackles the problem with a top-down, detect-then-segment approach.

The architecture consists of:

  • An object-detection network (Faster R-CNN)
  • A head on top that performs the segmentation of each detection (a Fully Convolutional Network)
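At inference time, torchvision's Mask R-CNN returns one dictionary of tensors per input image. A minimal sketch of that structure, with dummy NumPy arrays standing in for the real tensors so no model download is needed (all values are illustrative only):

```python
import numpy as np

# Shapes follow torchvision's Mask R-CNN output convention for N detections
# on an H x W image (here N=2, H=W=4, dummy values for illustration).
N, H, W = 2, 4, 4
prediction = {
    'boxes':  np.array([[0, 0, 2, 2], [1, 1, 3, 3]], dtype=np.float32),  # (N, 4) as (x1, y1, x2, y2)
    'labels': np.array([1, 18]),                          # (N,) COCO class indices ('person', 'dog')
    'scores': np.array([0.98, 0.75]),                     # (N,) confidences, sorted descending
    'masks':  np.zeros((N, 1, H, W), dtype=np.float32),   # (N, 1, H, W) soft masks in [0, 1]
}

# A binary mask per instance is obtained by thresholding the soft mask
binary_masks = prediction['masks'][:, 0] > 0.5            # (N, H, W) booleans
print(binary_masks.shape)
```

The soft masks are per-pixel probabilities, which is why the code further below thresholds them at 0.5 before drawing.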


Creating the Model

Thanks to PyTorch's torchvision module we have access to several pretrained models (i.e., models already trained on a particular dataset).

In [ ]:
import torchvision                        # torchvision gives access to the models
from torchvision import transforms as T   # transforms applies transformations to the images
import cv2                                # cv2 (OpenCV) is a general-purpose computer vision library
import matplotlib.pyplot as plt           # library for plotting graphs
import numpy as np                        # numpy is a library for scientific matrix computation
In [ ]:
# Download the model, with weights already pretrained on the COCO dataset.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Set the model to "eval" mode, since its behaviour during training differs from evaluation
model.eval()
Out[ ]:
MaskRCNN(
  (transform): GeneralizedRCNNTransform()
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
      )
      (layer2): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
      )
      (layer3): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (3): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (4): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (5): Bottleneck(
          (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
      )
      (layer4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          )
        )
        (1): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
        (2): Bottleneck(
          (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(original_name=FrozenBatchNorm2d)
          (relu): ReLU(inplace=True)
        )
      )
    )
    (fpn): FeaturePyramidNetwork(
      (inner_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
        (3): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (layer_blocks): ModuleList(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (extra_blocks): LastLevelMaxPool()
    )
  )
  (rpn): RegionProposalNetwork(
    (anchor_generator): AnchorGenerator()
    (head): RPNHead(
      (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (cls_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (bbox_pred): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
  )
  (roi_heads): RoIHeads(
    (box_roi_pool): MultiScaleRoIAlign()
    (box_head): TwoMLPHead(
      (fc6): Linear(in_features=12544, out_features=1024, bias=True)
      (fc7): Linear(in_features=1024, out_features=1024, bias=True)
    )
    (box_predictor): FastRCNNPredictor(
      (cls_score): Linear(in_features=1024, out_features=91, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=364, bias=True)
    )
    (mask_roi_pool): MultiScaleRoIAlign()
    (mask_head): MaskRCNNHeads(
      (mask_fcn1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu1): ReLU(inplace=True)
      (mask_fcn2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu2): ReLU(inplace=True)
      (mask_fcn3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu3): ReLU(inplace=True)
      (mask_fcn4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (relu4): ReLU(inplace=True)
    )
    (mask_predictor): MaskRCNNPredictor(
      (conv5_mask): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2))
      (relu): ReLU(inplace=True)
      (mask_fcn_logits): Conv2d(256, 91, kernel_size=(1, 1), stride=(1, 1))
    )
  )
)

Visualization and access utilities

In [ ]:
# Below are the labels of the objects recognizable by the network pretrained on the COCO dataset
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


def get_prediction(img_path, threshold):
  """
  Returns the masks, bounding boxes and predicted classes
  for a given image and a confidence threshold to apply to the model's scores.
  """
  # Read the image
  img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

  # Convert to a tensor
  transform = T.Compose([T.ToTensor()])
  img = transform(img)

  # Feed the image to the network
  pred = model([img])

  # Network confidence: keep only predictions above the threshold (0-1).
  # Predictions are sorted by descending score, so we keep everything
  # up to the last index whose score exceeds the threshold.
  pred_score = list(pred[0]['scores'].detach().numpy())
  above = [i for i, x in enumerate(pred_score) if x > threshold]
  pred_t = above[-1] if above else -1  # -1 -> nothing passes the threshold

  # Binarize the soft masks with a 0.5 confidence threshold (0-1);
  # squeeze only the channel dim, so a single detection keeps its leading dim
  masks = (pred[0]['masks'] > 0.5).squeeze(1).detach().cpu().numpy()

  # Extract the class names
  pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(pred[0]['labels'].numpy())]

  # Extract the bounding boxes (as integer pixel coordinates, as OpenCV's drawing functions expect)
  pred_boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))] for i in list(pred[0]['boxes'].detach().numpy())]

  # Shrink the structures so they only contain the kept predictions
  masks = masks[:pred_t+1]
  pred_boxes = pred_boxes[:pred_t+1]
  pred_class = pred_class[:pred_t+1]
  
  return masks, pred_boxes, pred_class


def random_colour_masks(image):
  """
  Colours a binary mask with a randomly chosen colour
  """
  # Colours encoded in RGB
  colours = [[0, 255, 0],[0, 0, 255],[255, 0, 0],[0, 255, 255],[255, 255, 0],[255, 0, 255],[80, 70, 180],[250, 80, 190],[245, 145, 50],[70, 150, 250],[50, 190, 190]]
  
  # Generate the RGB channels of the mask
  r = np.zeros_like(image).astype(np.uint8)
  g = np.zeros_like(image).astype(np.uint8)
  b = np.zeros_like(image).astype(np.uint8)
  
  # Set the colour wherever the mask is active
  r[image == 1], g[image == 1], b[image == 1] = colours[np.random.randint(0, len(colours))]
  
  # Stack the RGB layers
  coloured_mask = np.stack([r, g, b], axis=2)
  
  return coloured_mask


def instance_segmentation_api(img_path, threshold=0.5, rect_th=3, text_size=3, text_th=3):
  """
  Calls the model, gets the predictions and
  draws a visualization overlay of the predictions
  """
  # Get the predictions
  masks, boxes, pred_cls = get_prediction(img_path, threshold)
  
  # Read the image and draw the bounding boxes and masks
  img = cv2.imread(img_path)
  img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  for i in range(len(masks)):
    rgb_mask = random_colour_masks(masks[i])
    img = cv2.addWeighted(img, 1, rgb_mask, 0.5, 0)
    cv2.rectangle(img, boxes[i][0], boxes[i][1], color=(0, 255, 0), thickness=rect_th)
    cv2.putText(img, pred_cls[i], boxes[i][0], cv2.FONT_HERSHEY_SIMPLEX, text_size, (0, 255, 0), thickness=text_th)
  
  # Display the image
  plt.figure(figsize=(20,30))
  plt.imshow(img)
  plt.axis('off')
  plt.show()
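The score-thresholding step in `get_prediction` relies on torchvision returning detections sorted by descending score: the index of the last score above the threshold tells us how many detections to keep. A standalone sketch of that logic on dummy scores (illustrative values):

```python
# Dummy, already-sorted scores as Mask R-CNN would return them (illustrative values)
pred_score = [0.99, 0.95, 0.71, 0.42, 0.10]
threshold = 0.5

# Index of the last score above the threshold...
above = [i for i, s in enumerate(pred_score) if s > threshold]
pred_t = above[-1] if above else -1  # -1 -> nothing passes the threshold

# ...which yields the number of detections to keep
kept = pred_score[:pred_t + 1]
print(kept)  # [0.99, 0.95, 0.71]
```

The `else -1` branch guards the edge case where no detection passes the threshold, which would otherwise raise an IndexError.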

Downloading an image

In [ ]:
!wget https://images-na.ssl-images-amazon.com/images/I/A1ppzg2gLwL._SL1500_.jpg -O btles.jpg

img = cv2.imread('./btles.jpg', cv2.IMREAD_UNCHANGED)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(20,30))
plt.imshow(img)
plt.axis('off')
plt.show()
--2020-01-15 20:54:03--  https://images-na.ssl-images-amazon.com/images/I/A1ppzg2gLwL._SL1500_.jpg
Resolving images-na.ssl-images-amazon.com (images-na.ssl-images-amazon.com)... 54.192.150.11
Connecting to images-na.ssl-images-amazon.com (images-na.ssl-images-amazon.com)|54.192.150.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 415472 (406K) [image/jpeg]
Saving to: ‘btles.jpg’

btles.jpg           100%[===================>] 405.73K  --.-KB/s    in 0.007s  

2020-01-15 20:54:03 (59.9 MB/s) - ‘btles.jpg’ saved [415472/415472]

Get predictions!

In [ ]:
instance_segmentation_api('./btles.jpg', threshold=0.8, text_size=1, text_th=2, rect_th=2)

Hands-on LAB 👩‍💻

Challenge 1: Space teleportation 🌌

Are you ready? In this hands-on lab we will take a photo of a person and try to segment the person with our model. We will then download a picture of space and try to teleport the person straight into it.


Download a photo if you don't have a webcam

In [ ]:
!wget https://www.agenpress.it/wp-content/uploads/2019/10/AP_19201004713022-1000x667.jpg -O photo.jpg
--2020-01-16 16:50:37--  https://www.agenpress.it/wp-content/uploads/2019/10/AP_19201004713022-1000x667.jpg
Resolving www.agenpress.it (www.agenpress.it)... 89.46.108.12
Connecting to www.agenpress.it (www.agenpress.it)|89.46.108.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 89329 (87K) [image/jpeg]
Saving to: ‘photo.jpg’

photo.jpg           100%[===================>]  87.24K   551KB/s    in 0.2s    

2020-01-16 16:50:39 (551 KB/s) - ‘photo.jpg’ saved [89329/89329]

[ Optional ] Webcam

If you have a webcam, take a picture of yourself; otherwise skip to the next cell.

In [ ]:
#@title 👈 Capture


from IPython.display import display, Javascript
from google.colab.output import eval_js
from base64 import b64decode

def take_photo(filename='photo.jpg', quality=0.8):
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
    }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename

from IPython.display import Image
try:
  filename = take_photo()
  print('Saved to {}'.format(filename))
  
  # Show the image which was just taken.
  display(Image(filename))
except Exception as err:
  # Errors will be thrown if the user does not have a webcam or if they do not
  # grant the page permission to access it.
  print(str(err))
Saved to photo.jpg

Displaying the image

In [ ]:
import cv2
import matplotlib.pyplot as plt

# Load the photo we just took and convert the channels from BGR to RGB
img = cv2.imread("./photo.jpg", cv2.IMREAD_UNCHANGED)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Display the image
plt.figure(figsize=(10,10))
plt.imshow(img)
plt.axis('off')
plt.show()

Getting the segmentation

In [ ]:
import torchvision
import numpy as np

# Instanziamo il modello da torchvision
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Settiamo il modello in "eval" mode in quanto il comportamento durante il training differisce da quello di evaluation
model.eval()

# Trasformiamo l'immagine in tensore
image_tensor = torchvision.transforms.functional.to_tensor(img)

# Passiamo l'immagine alla rete
output = model([image_tensor])

# Recuperiamo la maschera...cosa succede se ci sono più persone? 
mask = output[0]['masks'].detach().numpy()[0,0,:]

# Teniamo solamente i pixel corrispondenti alla segmentazione
img[:,:,0] = img[:,:,0]*mask
img[:,:,1] = img[:,:,1]*mask
img[:,:,2] = img[:,:,2]*mask

# Visualizziamo l'immagine
plt.figure(figsize=(10,10))
plt.imshow(img)
plt.axis('off')
plt.show()
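Taking the first mask implicitly assumes the highest-scoring detection is the person. A more robust sketch (on a dummy output dictionary, where label 1 is 'person' per the COCO list above) picks the first detection whose label is 'person', exploiting the fact that detections come sorted by score:

```python
import numpy as np

# Dummy model output: two detections, a dog (label 18) scoring higher than a person (label 1).
# All values are illustrative stand-ins for the real tensors.
fake_output = [{
    'labels': np.array([18, 1]),
    'scores': np.array([0.97, 0.88]),
    'masks':  np.stack([np.full((1, 4, 4), 0.9, dtype=np.float32),
                        np.full((1, 4, 4), 0.8, dtype=np.float32)]),
}]

PERSON = 1  # index of 'person' in COCO_INSTANCE_CATEGORY_NAMES
person_idx = [i for i, l in enumerate(fake_output[0]['labels']) if l == PERSON]
best = person_idx[0]  # detections are sorted by score, so the first match is the best person
person_mask = fake_output[0]['masks'][best, 0]
print(best, person_mask.shape)
```

With the real network, `fake_output` would simply be the `output` returned by `model([image_tensor])`.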

Space background

In [ ]:
!wget https://thebuzzpaper.com/wp-content/uploads/2019/11/space-signals-3246.jpg -O bground.jpg
--2020-01-16 17:02:01--  https://thebuzzpaper.com/wp-content/uploads/2019/11/space-signals-3246.jpg
Resolving thebuzzpaper.com (thebuzzpaper.com)... 104.27.153.187, 104.27.152.187, 2606:4700:3034::681b:99bb, ...
Connecting to thebuzzpaper.com (thebuzzpaper.com)|104.27.153.187|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1041718 (1017K) [image/jpeg]
Saving to: ‘bground.jpg’

bground.jpg         100%[===================>]   1017K  --.-KB/s    in 0.03s   

2020-01-16 17:02:02 (28.8 MB/s) - ‘bground.jpg’ saved [1041718/1041718]

In [ ]:
# Load the background and convert the channels from BGR to RGB
bground = cv2.imread("./bground.jpg", cv2.IMREAD_UNCHANGED)
bground = cv2.cvtColor(bground, cv2.COLOR_BGR2RGB)

# Resize the background to the size of the person image
# (note: cv2.resize expects the target size as (width, height))
bground = cv2.resize(bground, (img.shape[1],img.shape[0]))


#----------------- Display the background image --------------------------------

# Display the image
plt.figure(figsize=(10,10))
plt.imshow(bground)
plt.axis('off')
plt.show()

#-------------------------------------------------------------------------------

Let's teleport!

In [ ]:
#------------ Can you teleport the person into space? --------------------------
not_mask = mask < 0.5
not_mask = not_mask.astype(np.float32)  # np.float is deprecated; use an explicit dtype
bground[:,:,0] = bground[:,:,0]*not_mask
bground[:,:,1] = bground[:,:,1]*not_mask
bground[:,:,2] = bground[:,:,2]*not_mask

risultato = bground + img
#-------------------------------------------------------------------------------
#-------------------------------------------------------------------------------

# Display the result
plt.figure(figsize=(10,10))
plt.imshow(risultato)
plt.axis('off')
plt.show()
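The channel-by-channel multiplications above implement the classic compositing formula result = foreground·mask + background·(1 − mask). With NumPy broadcasting the same idea can be written in one `np.where` call; a sketch on tiny dummy arrays (illustrative values):

```python
import numpy as np

# Tiny dummy person image, background and mask (illustrative values)
img     = np.full((2, 2, 3), 200, dtype=np.uint8)   # "person" pixels
bground = np.full((2, 2, 3), 10,  dtype=np.uint8)   # "space" pixels
mask    = np.array([[1.0, 0.0], [0.0, 1.0]])        # soft mask as the network would output

# Broadcast the (H, W) mask against the (H, W, 3) images via a trailing axis
composite = np.where(mask[..., None] > 0.5, img, bground)
print(composite[0, 0], composite[0, 1])  # [200 200 200] [10 10 10]
```

Besides being shorter, hard-thresholding the mask this way avoids the slight overflow that summing two soft-masked uint8 images can produce at the mask edges.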
Francesco Pelosin
Ph.D. Student in Computer Science

My main interests are Computer Vision / Pattern Recognition, with a particular focus on unsupervised approaches.
